Comprehensible exploratory induction with decision graphs
Abstract
This article addresses the problem of inducing comprehensible hypotheses on complex domains. It proposes a hypothesis language (a type of decision graph) and an inductive algorithm (PASTEUR) that naturally induces structured rules. It reviews the advantages of this language over other hypothesis languages in terms of comprehensibility, as well as the possibility it offers of representing exceptions explicitly, and presents some basic features of the algorithm. The experimental evaluation is carried out on two classical machine learning databases from the UCI repository.

A. Introduction

The Machine Learning community has traditionally been concerned with the performance of learning algorithms in terms of prediction accuracy. However, the format and nature of the results produced have increasingly been recognized as essential to guarantee good readability and, consequently, to permit efficient use of the algorithm's output. This is particularly true in exploratory induction, in which the induced model is used not primarily for its predictive power but rather as an analysis tool on a complex domain. A major application area of exploratory induction is machine-aided discovery [Corruble & Ganascia, 1993, 1994].

In that respect, the induction of rule bases has shown great potential in different systems, because a logical implication is cognitively manageable and because it is a natural input to classical expert systems based on inference engines. However, experience on real problems shows that evaluating and understanding large rule bases is highly problematic. When more than one or two dozen rules are induced on a complex domain, it becomes nearly impossible to get a global idea of their meaning. The results obtained are therefore difficult to interpret and the comprehension of the domain explored is not improved, even though, from the syntactic point of view, the desired concepts can be learned with good accuracy.
It is well known that the quality of the results can be improved by refining the learning bias. However, if such a task is to be undertaken, some link must be made between the bias and the output, and this link must take into account the semantics of the domain. The blurred semantics of the output therefore make it very difficult to improve the learning bias.

One of the main contributions of the approach presented here is to propose a hypothesis space that preserves the main advantages of rule bases yet structures them into largely independent entities, each of which has its own coherence and therefore favors human analysis. A further advantage of this hypothesis language lies in the possibility of inducing decision graphs that can accommodate exceptions. Different graph structures of interest can be obtained. For instance, the algorithm can induce logical expressions that indirectly introduce the concept of negation, although negation is never explicitly present in the description language or the search space. This increased representational power makes it possible to induce relatively simple and comprehensible hypotheses describing a complex phenomenon.

The first section of this article reviews a number of competing hypothesis languages based on rules, points out their specific advantages, and analyses their behavior as the complexity of the learning task and of the domain increases. The second section introduces the hypothesis language that we propose and the inductive algorithm that uses it. Some experimental results are given in the last section.

B. A critical review of existing hypothesis languages

In this section we review one of the most commonly used hypothesis languages of symbolic machine learning, namely rule bases, and evaluate it and some of its extensions in terms of comprehensibility, first in the general case, and then along the axes of increasing complexity and increasing uncertainty.
Rule bases have strong advantages as a hypothesis language for machine learning. The most obvious one is that they can easily be expressed in natural language. The logical implication thus provides a self-contained semantics that needs no further interpretation to be cognitively manageable. Numerous theoretical studies and existing systems, such as MYCIN [Shortliffe, E.H. 1976], have shown this advantage to be real. Along the same lines, the term modularity has been used to describe the advantage of rule systems [Heckerman, D. E. & Horvitz, E. J. 1988].

What happens to the modularity of rule systems when the complexity of the domain increases? Anyone who has induced rules on real complex problems has observed that the number of rules induced increases and the proportion of the training set explained by each rule tends to go down, so that each rule becomes a "micro-theory" covering only very specific cases, and the link between the different rules becomes very blurred. The advantage of modularity attributed to rule bases thus fades when inductive algorithms are confronted with real complex problems.

The use of rule systems in non-deterministic domains raises the problem of deciding which rule to activate among all those whose premises are satisfied by a new example. A first answer to this problem is the use of decision tables, as for example in [Kohavi, R. 1995]. A decision table is a simple way of avoiding the choice among the set of rules, since it activates all of them and then holds a vote to select the class predicted by the highest number of rules. Decision tables are very flexible, but they introduce a statistical component into the decision process that is difficult to interpret semantically in the domain studied at an exploratory stage, especially when the number of rules participating in the vote gets high.
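The voting scheme described above can be illustrated with a minimal Python sketch. It assumes that every rule whose premise is satisfied casts one vote for its class; the rules, attribute names, and class labels are hypothetical and are not taken from the article or from any cited system.

```python
from collections import Counter

# Hypothetical rule set: each rule is a (premise, predicted_class) pair.
# Attribute names and classes are illustrative assumptions only.
RULES = [
    (lambda ex: ex["humidity"] == "high", "no_play"),
    (lambda ex: ex["outlook"] == "sunny", "no_play"),
    (lambda ex: ex["wind"] == "weak", "play"),
]


def vote(rules, example):
    """Fire every rule whose premise holds and return (majority class, tally).

    Returns (None, empty tally) when no rule applies.
    """
    tally = Counter(label for premise, label in rules if premise(example))
    if not tally:
        return None, tally
    return tally.most_common(1)[0][0], tally


example = {"humidity": "high", "outlook": "sunny", "wind": "weak"}
predicted, tally = vote(RULES, example)
print(predicted, dict(tally))
```

In this sketch the prediction depends on the whole tally rather than on any single rule, which is one way to see why an individual rule loses its standalone meaning under voting.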
A rule taken individually (independently of the other rules of the decision table) actually has no value, so decision tables address the issue of multiple activation by giving up the syntactic modularity of rule systems. It is also obvious that the lack of semantic modularity observed in large rule systems becomes even stronger in the case of decision tables.

The same problem is also addressed by some systems that induce decision lists (for example CN2 [Clark P. & Niblett T., 1989]). A decision list is a structure studied formally by Rivest [Rivest, 1987] and is similar to the default hierarchies introduced by John Holland in the field of genetic machine learning [Holland et al., 1987]. This structure introduces a simple version of default logic in which the rule activated is the first member of the list whose premise is satisfied and whose successor does not have its premises satisfied. The possibility of inducing structured default rules is a great improvement, both for representing examples as exceptions to general rules and in terms of the comprehensibility of the output. Decision lists are more comprehensible because their structure is closer to the one a human would use when confronted with the same problem, trying to jump to a conclusion as soon as possible.

One problem with decision lists is that the linear structure of the list is too constrained, both in terms of syntactic flexibility and by comparison with the way humans approach a problem. Anyone observing a scientist or a domain expert looking at experimental data realizes that his or her analysis and hypothesis formation process is far from linear and can more or less explicitly use highly complex structures that are not reducible to an ordered list.
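As an illustration of how an ordered list can encode exceptions to general rules, here is a minimal Python sketch of the common first-match reading of a decision list: rules are tried in order, the first one whose premise holds decides the class, and a default class applies otherwise. Placing the exception before the general rule is an assumption of this sketch, and the article's own description in terms of successors may correspond to a different ordering. The rules, attributes, and labels are hypothetical.

```python
# Hypothetical decision list: the exception precedes the general rule.
# Attribute names and class labels are illustrative assumptions only.
DECISION_LIST = [
    (lambda ex: ex["penguin"], "cannot_fly"),  # exception
    (lambda ex: ex["bird"], "can_fly"),        # general rule
]
DEFAULT_CLASS = "cannot_fly"


def classify(decision_list, default, example):
    """Return the class of the first rule whose premise holds, else the default."""
    for premise, label in decision_list:
        if premise(example):
            return label
    return default


sparrow = {"bird": True, "penguin": False}
penguin = {"bird": True, "penguin": True}
stone = {"bird": False, "penguin": False}

for name, ex in [("sparrow", sparrow), ("penguin", penguin), ("stone", stone)]:
    print(name, classify(DECISION_LIST, DEFAULT_CLASS, ex))
```

The strictly linear scan is also what makes the structure rigid: every exception has to be placed somewhere in a single total order, even when it is only relevant to one general rule.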
Similar resources
Rule Induction as Exploratory Data
This paper examines induction of decision rules for purposes of exploratory data analysis, and presents various tools and techniques for this. Decision tables provide a compact, consistent format that allows several fairly large classifiers to be examined on a page. Variation in rulesets arises naturally to a surprisingly large degree from small differences in sampling from a training set, and e...
Web Categorisation Using Distance-Based Decision Trees
In Web classification, web pages are assigned to pre-defined categories mainly according to their content (content mining). However, the structure of the web site might provide extra information about their category (structure mining). Traditionally, both approaches have been applied separately, or are dealt with using techniques that do not generate a model, such as Bayesian techniques. Unfortunatel...
Rule Extraction from Ensemble Methods Using Aggregated Decision Trees
Ensemble methods have become very well known for being powerful pattern recognition algorithms capable of achieving high accuracy. However, ensemble methods produce learners that are not comprehensible or transferable, thus making them unsuitable for tasks that require a rational justification for making a decision. Rule extraction methods can resolve this limitation by extracting comprehensibl...
Visualizing the Simple Bayesian Classifier
The simple Bayesian classifier (SBC), sometimes called Naive-Bayes, is built based on a conditional independence model of each attribute given the class. The model was previously shown to be surprisingly robust to obvious violations of this independence assumption, yielding accurate classification models even when there are clear conditional dependencies. The SBC can serve as an excellent tool fo...